Estimating the Strength of Selective Sweeps from Deep Population Diversity Data
Abstract
Selective sweeps are typically associated with a local reduction of genetic diversity around the adaptive site. However, selective sweeps can also quickly carry neutral mutations to observable population frequencies if they arise early in a sweep and hitchhike with the adaptive allele. We show that the interplay between mutation and exponential amplification through hitchhiking results in a characteristic frequency spectrum of the resulting novel haplotype variation that depends only on the ratio of the mutation rate and the selection coefficient of the sweep. On the basis of this result, we develop an estimator for the selection coefficient driving a sweep. Since this estimator utilizes the novel variation arising from mutations during a sweep, it does not rely on preexisting variation and can also be applied to loci that lack recombination. Compared with standard approaches that infer selection coefficients from the size of dips in genetic diversity around the adaptive site, our estimator requires much shorter sequences but sampled at high population depth to capture low-frequency variants; given such data, it consistently outperforms standard approaches. We investigate analytically and numerically how the accuracy of our estimator is affected by the decay of the sweep pattern over time as a consequence of random genetic drift and discuss potential effects of recombination, soft sweeps, and demography. As an example for its use, we apply our estimator to deep sequencing data from human immunodeficiency virus populations.
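The dependence on the ratio of mutation rate to selection coefficient can be illustrated with a heuristic calculation. This is a sketch under simplifying assumptions, not the derivation used in the paper, and all notation is introduced here for illustration: the sweeping lineage is assumed to grow deterministically as n(t) = e^{st} early in the sweep, neutral mutations are assumed to arise on it at rate u n(t), a mutation arising at time t is assumed to end up at frequency x of roughly 1/n(t) within the swept haplotype, and K(x) denotes the expected number of novel variants exceeding frequency x. Drift and the slowdown of the sweep near fixation are ignored.

```latex
% Heuristic sketch only; u, s, n(t), x, K(x), and t_x are notation assumed here,
% not taken from the paper.
\begin{align}
  n(t) &= e^{st}, \\
  x(t) &\approx \frac{1}{n(t)} = e^{-st}, \\
  K(x) &\approx \int_{0}^{t_x} u\, n(t)\, dt
        = \frac{u}{s}\left(\frac{1}{x} - 1\right),
  \qquad t_x = \frac{1}{s}\ln\frac{1}{x}.
\end{align}
```

Under these assumptions u and s enter only through u/s, so counting low-frequency variants above a threshold would constrain s when u is known. The estimator described in the abstract may differ in its details.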
Similar resources
Detecting Selective Sweeps: A New Approach Based on Hidden Markov Models
Detecting and localizing selective sweeps based on SNP data has recently received considerable attention. Here we introduce the use of Hidden Markov Models (HMMs) for the detection of selective sweeps in DNA sequences. Like previously published methods, our HMMs use the site frequency spectrum, and the spatial pattern of diversity along the sequence, to identify selection. In contrast to earlie...
Evaluating the ability of the pairwise joint site frequency spectrum to co-estimate selection and demography
The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it has been well-described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes ...
Detecting selective sweeps: a new approach based on hidden Markov models.
Detecting and localizing selective sweeps on the basis of SNP data has recently received considerable attention. Here we introduce the use of hidden Markov models (HMMs) for the detection of selective sweeps in DNA sequences. Like previously published methods, our HMMs use the site frequency spectrum, and the spatial pattern of diversity along the sequence, to identify selection. In contrast to...
Detecting bottlenecks and selective sweeps from DNA sequence polymorphism.
A coalescence-based maximum-likelihood method is presented that aims to (i) detect diversity-reducing events in the recent history of a population and (ii) distinguish between demographic (e.g., bottlenecks) and selective causes (selective sweep) of a recent reduction of genetic variability. The former goal is achieved by taking account of the distortion in the shape of gene genealogies generat...
Selective sweeps for recessive alleles and for other modes of dominance.
A selective sweep describes the reduction of linked genetic variation due to strong positive selection. If s is the fitness advantage of a homozygote for the beneficial allele and h its dominance coefficient, it is usually assumed that h=1/2, i.e. the beneficial allele is co-dominant. We complement existing theory for selective sweeps by assuming that h is any value in [0, 1]. We show that gene...